Comparing Value-Function Estimation Algorithms in Undiscounted Problems

Author

  • Ferenc Beleznay
Abstract

We compare the scaling properties of several value-function estimation algorithms. In particular, we prove that Q-learning can scale exponentially slowly with the number of states. We identify the reasons for the slow convergence and show that both TD(λ) and learning with a fixed learning rate enjoy rather fast convergence, just like the model-based method.
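For intuition, the following is a minimal, self-contained Python sketch of the two kinds of updates being contrasted: a tabular value update with the usual 1/visit-count step size (the schedule under which the slow scaling arises) versus TD(λ) with a fixed learning rate. The chain environment, parameter values, and names below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Illustrative comparison of two step-size schedules for tabular TD-style
# value estimation on an undiscounted chain. The environment and constants
# are assumptions for demonstration, not taken from the paper.

n_states = 10          # states 0..n_states-1; the last state is terminal
gamma = 1.0            # undiscounted
n_episodes = 500
rng = np.random.default_rng(0)

def run_episode():
    """Random walk on a chain; reward 1 only on reaching the terminal state."""
    s, traj = 0, []
    while s < n_states - 1:
        s_next = s + 1 if rng.random() < 0.5 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        traj.append((s, r, s_next))
        s = s_next
    return traj

# --- Decaying 1/visit-count step size (Q-learning-style schedule) ---
V_count = np.zeros(n_states)
counts = np.zeros(n_states)
for _ in range(n_episodes):
    for s, r, s_next in run_episode():
        counts[s] += 1
        alpha = 1.0 / counts[s]                       # harmonic step size
        V_count[s] += alpha * (r + gamma * V_count[s_next] - V_count[s])

# --- TD(lambda) with a fixed learning rate and accumulating traces ---
V_td = np.zeros(n_states)
alpha_fixed, lam = 0.1, 0.9
for _ in range(n_episodes):
    e = np.zeros(n_states)                            # eligibility traces
    for s, r, s_next in run_episode():
        delta = r + gamma * V_td[s_next] - V_td[s]    # TD error
        e[s] += 1.0
        V_td += alpha_fixed * delta * e
        e *= gamma * lam

print("1/count step size:", np.round(V_count, 2))
print("TD(lambda), fixed:", np.round(V_td, 2))
```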


Similar Articles

A K-step look-ahead analysis of Value Iteration algorithms for Markov decision processes

We introduce and analyze a general look-ahead approach for Value Iteration algorithms used in solving both discounted and undiscounted Markov decision processes. This approach, based on the value-oriented concept interwoven with multiple adaptive relaxation factors, leads to accelerating procedures which perform better than the separate use of either the value-oriented concept or of relaxa...
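As a point of reference, the sketch below shows plain value iteration on a small discounted MDP, i.e. the baseline that look-ahead and relaxation-factor schemes aim to accelerate; the transition matrix, rewards, and tolerance are made-up illustrative values, not from the paper.

```python
import numpy as np

# Plain value iteration on a toy 3-state, 2-action discounted MDP.
# All numbers below are illustrative assumptions.

P = np.array([  # P[a, s, s'] transition probabilities
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 2.0]])   # R[a, s] expected reward
gamma, tol = 0.95, 1e-8

V = np.zeros(P.shape[1])
while True:
    # Bellman optimality backup: max over actions of expected reward
    # plus discounted expected next-state value.
    Q = R + gamma * (P @ V)          # shape (n_actions, n_states)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < tol:
        break
    V = V_new

print("Optimal values:", np.round(V, 3))
print("Greedy policy :", Q.argmax(axis=0))
```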


Estimation of LOS Rates for Target Tracking Problems using EKF and UKF Algorithms- a Comparative Study

One of the most important problems in target tracking is Line-Of-Sight (LOS) rate estimation for use in the PN (proportional navigation) guidance law. This paper deals with estimation of the position and LOS rates of a target with respect to the pursuer from available noisy RF seeker and tracker measurements. Due to many important for exact estimation on tracking problems must target position and Line O...


The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems

This paper considers undiscounted Markov Decision Problems. For the general multichain case, we obtain necessary and sufficient conditions which guarantee that the maximal total expected reward for a planning horizon of n epochs minus n times the long run average expected reward has a finite limit as n → ∞ for each initial state and each final reward vector. In addition, we obtain a character...
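Stated symbolically, with notation assumed here for illustration rather than taken from the paper, the result asserts:

```latex
% v_n(i; r) : maximal total expected reward over n epochs from initial
%             state i with final (terminal) reward vector r
% g^*(i)    : optimal long-run average expected reward from state i
\[
  \lim_{n \to \infty} \bigl( v_n(i; r) - n\, g^*(i) \bigr)
  \ \text{exists and is finite for every initial state } i
  \text{ and every final reward vector } r .
\]
```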


Regular Policies in Abstract Dynamic Programming

We consider challenging dynamic programming models where the associated Bellman equation, and the value and policy iteration algorithms commonly exhibit complex and even pathological behavior. Our analysis is based on the new notion of regular policies. These are policies that are well-behaved with respect to value and policy iteration, and are patterned after proper policies, which are central...


Affine Monotonic and Risk-Sensitive Models in Dynamic Programming

In this paper we consider a broad class of infinite horizon discrete-time optimal control models that involve a nonnegative cost function and an affine mapping in their dynamic programming equation. They include as special cases classical models such as stochastic undiscounted nonnegative cost problems, stochastic multiplicative cost problems, and risk-sensitive problems with exponential cost. ...




Publication date: 1999